Week 4 Worksheet
Learning outcomes
By the end of the session, you should be familiar with:
- running simple and multiple linear regression in JASP
- performing a correlation analysis in JASP
- model building in JASP
- the interpretation of linear regression coefficients
Intro
We continue where we left off last week, building on Week 3 Worksheet - Exercise 2, in which we made a scatter plot of inequality by social trust using the Trust & Inequality (trust_inequality.dta) dataset, which can be downloaded from https://cgmoreh.github.io/SOC2069-QUANT/Data/.
In that exercise we simplified the default output by removing the univariate distributions of the variables displayed on the margins and the regression line cutting through the plot. Now, however, we will focus on understanding what that “regression line” is actually telling us.
In later exercises we apply the same techniques to replicate a small part of the regression model reported in Österman (2021) (specifically, model (1) in the summary Table 3, which is presented in more detail in Table A.3 in the Online Supplementary Material accompanying the article).
Finally - probably outside class - you should practice the same linear regression modelling techniques on one of the assignment datasets and research questions.
Exercise 1: From a regression line to regression coefficients
If you didn’t download it last week, download the Trust & Inequality (trust_inequality.dta) dataset from https://cgmoreh.github.io/SOC2069-QUANT/Data/.
Task 1.1: Visualise the relationship
As a first step, create a scatter plot visualising the “relationship” (co-variation, joint distribution, …) between social trust (trust_pct) and inequality (inequality_s80s20). This is Exercise 2 from Week 3 - if you need a reminder of how to do it, check Week 3 Worksheet - Exercise 2 or your saved .jasp file containing your workshop analysis from Week 3.
Task 1.2: Model the relationship
Now let’s dig deeper into the meaning of the regression line by building a simple bivariate linear regression model of social trust as a function of societal inequality (i.e. a model aiming to explain/predict values of social trust in various countries depending on the value of societal inequality in those countries).
To build a linear regression model in JASP, click through the Menu tabs:
\[ \text{Regression} \longrightarrow \text{[Classical] Linear regression} \]
In the Linear regression panel, move the “social trust” variable to the \(\text{Dependent Variable}\) box and the “inequality” variable to the \(\text{Covariates}\) box.
The results from the linear regression model will appear in the outputs window on the right.
Task 1.3: Interpret the regression model output
Questions
- Using the lecture slides and Chapter 7 (“Linear regression with a single predictor”) from the Introduction to Modern Statistics (IMS), interpret the meaning of the regression coefficient on “inequality”.
- Add a note on the JASP output under the \(\text{Coefficients}\) output and write down your interpretation there. [Tip: You’ve already practiced adding notes to the outputs in Week 2, Exercise 3, Point 7]
- Where can you find the coefficient of correlation (\(R\)) in the outputs? What about the coefficient of determination (\(R^2\))?
Task 1.4: Find the correlation coefficient using a “correlation” test instead
To run a simple bivariate correlation analysis in JASP, go through the Menu tabs:
\[ \text{Regression} \longrightarrow \text{[Classical] Correlation} \]
Move both of the variables of interest to the \(\text{Variables}\) box.
Check whether the results match those obtained using linear regression.
Exercise 2: Linear regression with categorical predictors
Now we will build another simple bivariate regression model, but this time we will use the variable Region to model/explain/predict levels of “social trust” in different countries. Region is the only Nominal categorical variable in this dataset, and categorical variables behave differently in regression models.
Task 2.1: Describe the Region variable using a Frequency table
Tip: You have done this a few times in previous workshops. Check back on previous exercises if you need to remind yourself of how to create a frequency table.
Task 2.2: Build a simple bivariate regression model
Although categorical predictors behave differently, the steps for fitting the regression are very similar to those in the previous exercise:
- Click through the Menu tabs:
\[ \text{Regression} \longrightarrow \text{[Classical] Linear regression} \]
- In the Linear regression panel, move the “social trust” variable to the \(\text{Dependent Variable}\) box
- BUT THIS TIME, we will move the Region variable to the \(\text{Factors}\) box instead.
This tells JASP that the Region variable is categorical and should be modelled as such: each of its categories is treated as a separate factor/indicator variable, and the first category (Task 2.1 above will tell you which one that is!) is automatically left out of the model. The left-out category becomes the baseline/reference against which the coefficients on all the other categories are compared. In effect, the left-out category is absorbed into the “Intercept”, which in this model represents the expected value of the dependent variable for countries in the reference category.
The results from the linear regression model will appear in the outputs window on the right.
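To see what this indicator coding does, here is a small sketch in plain Python. Everything in it is an assumption made for illustration: the values are made up, the labels "East", "North" and "South" are hypothetical rather than the actual categories of Region, the reference level is simply the first label alphabetically, and `ols` is a hand-written helper, not what JASP runs internally.

```python
# Made-up values; the region labels are hypothetical, NOT the dataset's.
region = ["East", "East", "North", "North", "South", "South"]
trust = [30.0, 34.0, 60.0, 64.0, 40.0, 44.0]

levels = sorted(set(region))            # assumed level order: alphabetical
reference, others = levels[0], levels[1:]

# Design matrix: a constant plus one 0/1 indicator per non-reference level.
X = [[1.0] + [1.0 if r == lvl else 0.0 for lvl in others] for r in region]

def ols(X, y):
    """Least squares via the normal equations and Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting: bring the largest entry in this column up.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

coefs = ols(X, trust)
intercept = coefs[0]
effects = dict(zip(others, coefs[1:]))

def group_mean(level):
    values = [t for t, r in zip(trust, region) if r == level]
    return sum(values) / len(values)

print(f"intercept = {intercept:.2f}, mean of {reference} = {group_mean(reference):.2f}")
for lvl, b in effects.items():
    diff = group_mean(lvl) - group_mean(reference)
    print(f"coefficient on {lvl} = {b:.2f}, difference in means = {diff:.2f}")
```

The intercept reproduces the mean trust of the reference category, and each remaining coefficient is the difference between that category's mean and the reference mean. This is the pattern to look for when interpreting the Region coefficients in the JASP output.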
Questions
- Using the lecture slides and the assigned readings from Introduction to Modern Statistics (IMS), interpret the meaning of the regression coefficients on each reported level of the Region variable;
- Which one is the “reference”/“baseline” category?
- Add a note on the JASP output under the \(\text{Coefficients}\) output and write down your interpretation there. [Tip: You’ve already practiced adding notes to the outputs in Week 2, Exercise 3, Point 7]
- Where can you find the coefficient of correlation (\(R\)) in the outputs? What about the coefficient of determination (\(R^2\))? Are they meaningful in this context? Why so, or why not?
Exercise 3: Build a multiple regression model
We can now combine the separate bivariate analyses in the previous two exercises into a more elaborate multiple regression model. The procedure to build a multiple regression model is the same as in the simple regression models before, but this time we add both of the independent variables into the model:
\[ \text{Regression} \longrightarrow \text{[Classical] Linear regression} \]
- In the Linear regression panel, move the “social trust” variable to the \(\text{Dependent Variable}\) box
- Move the “inequality” variable to the \(\text{Covariates}\) box
- Move the Region variable to the \(\text{Factors}\) box
The results will appear in the outputs window on the right. We now have a statistical model which explains variation in “social trust” as a function not only of “inequality” but also of “Region”. Put differently, if our main aim is to estimate how “inequality” is associated with “social trust”, we have obtained an estimate of that association which accounts for variation due to differences in the Region to which countries belong.
Questions
- Using the lecture slides and the assigned readings from Introduction to Modern Statistics (IMS), interpret the meaning of each regression coefficient, comparing them with the ones obtained from the simpler models in the previous exercises;
- Add a note on the JASP output under the \(\text{Coefficients}\) output and write down your interpretations there.
- Where can you find the coefficient of correlation (\(R\)) in the outputs? What about the coefficient of determination (\(R^2\))? Are they meaningful in this context? Why so, or why not?